Building a better AI data engine -- Visual Studio Magazine
AI practitioners regularly face a few common challenges: too much time spent building and maintaining tools and infrastructure, siloed AI development efforts, and fragmented processes to evaluate quality. We believe that designing a workflow that optimizes for automation and iteration can lead to more accurate data models, faster implementation and up to 70% cost savings.
How to Fine-Tune a Transformer Architecture NLP Model -- Visual Studio Magazine
This article describes how to fine-tune a pretrained Transformer Architecture model for natural language processing. More specifically, this article explains how to fine-tune a condensed version of a pretrained BERT model to create a binary classifier for a subset of the IMDB movie review dataset. The goal is sentiment analysis -- accept the text of a movie review (such as, "This movie was a great waste of my time.") and classify it as positive or negative. You can think of a pretrained transformer architecture (TA) model as a sort of English language expert.
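The fine-tuning pattern the article describes -- keep the pretrained weights and train a new classification head on top -- can be sketched in plain PyTorch. Note this is a minimal illustration, not the article's code: the tiny random encoder below is only a stand-in for the condensed pretrained BERT model, and all names here are invented.

```python
# Illustrative sketch: freeze a "pretrained" encoder and fine-tune only
# a new binary classification head. The encoder is a random stand-in
# for a condensed BERT (e.g. a distilled model); data is synthetic.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a pretrained encoder: token ids -> pooled feature vector.
encoder = nn.Sequential(
    nn.Embedding(100, 16),   # vocab of 100 token ids, 16-dim embeddings
    nn.Flatten(),            # (batch, 8 tokens, 16) -> (batch, 128)
    nn.Linear(16 * 8, 16),
)

# Freeze the pretrained weights; only the new head is trained.
for p in encoder.parameters():
    p.requires_grad = False

head = nn.Linear(16, 1)      # one logit: positive vs. negative review
opt = torch.optim.Adam(head.parameters(), lr=0.01)
loss_fn = nn.BCEWithLogitsLoss()

# Toy batch: 4 "reviews" of 8 token ids each, with 0/1 sentiment labels.
x = torch.randint(0, 100, (4, 8))
y = torch.tensor([[1.0], [0.0], [1.0], [0.0]])

for _ in range(50):
    opt.zero_grad()
    loss = loss_fn(head(encoder(x)), y)
    loss.backward()          # gradients flow only into the head
    opt.step()

preds = (torch.sigmoid(head(encoder(x))) > 0.5).float()
```

In a real fine-tuning run the stand-in encoder would be replaced by the actual pretrained model, and often its upper layers are unfrozen as well.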
Computing the Similarity Between Two Machine Learning Datasets -- Visual Studio Magazine
At first thought, computing the similarity/distance between two datasets sounds easy, but in fact the problem is extremely difficult, explains Dr. James McCaffrey of Microsoft Research. A fairly common sub-problem in many machine learning and data science scenarios is the need to compute the similarity (or difference, or distance) between two datasets. For example, if you select a sample from a huge set of training data, you likely want to know how similar the sample dataset is to the source dataset. Or if you want to prime the training for a very deep neural network, you need to find an existing model that was trained using a dataset that is most similar to your new dataset. If you try to compare individual lines between datasets, you quickly run into the combinatorial explosion problem -- there are just too many comparisons.
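One simple baseline that sidesteps the combinatorial explosion is to reduce each dataset to per-column summary statistics and compare those. This sketch is an assumption-laden simplification, not the article's method -- the article stresses that the full problem is much harder than this.

```python
# Baseline dataset distance: summarize each dataset by per-column mean
# and standard deviation, then take the Euclidean distance between the
# summary vectors. Row-by-row comparison is never needed.
import numpy as np

def dataset_distance(a: np.ndarray, b: np.ndarray) -> float:
    stats_a = np.concatenate([a.mean(axis=0), a.std(axis=0)])
    stats_b = np.concatenate([b.mean(axis=0), b.std(axis=0)])
    return float(np.linalg.norm(stats_a - stats_b))

rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(1000, 3))   # "huge" training set
sample = source[:200]                           # a sample drawn from it
shifted = source + 5.0                          # a clearly different set

d_close = dataset_distance(source, sample)      # small
d_far = dataset_distance(source, shifted)       # large
```

The obvious weakness is that two very different datasets can share identical summary statistics; that is exactly why the general problem remains difficult.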
How To: Create a Streaming Data Loader for PyTorch -- Visual Studio Magazine
When training data won't fit into machine memory, a streaming data loader using an internal memory buffer can help. Dr. James McCaffrey shows how, with full code samples. When using the PyTorch neural network library to create a machine learning prediction model, you must prepare the training data and write code to serve up the data in batches. In situations where the training data is too large to fit into machine memory, one approach is to write a data loader that streams the data using an internal memory buffer. This article shows you how to create a streaming data loader for large training data files.
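The buffered-streaming idea can be sketched with a PyTorch `IterableDataset`: read the file line by line, accumulate items in a fixed-size internal buffer, and serve from the buffer. The class name, buffer size, and file format below are illustrative assumptions, not the article's code.

```python
# Sketch of a streaming data loader: an IterableDataset that reads a
# comma-separated numeric file through a small internal buffer, so the
# whole file never has to fit in memory at once.
import tempfile
import torch
from torch.utils.data import DataLoader, IterableDataset

class StreamingDataset(IterableDataset):
    def __init__(self, path, buffer_size=4):
        self.path = path
        self.buffer_size = buffer_size

    def __iter__(self):
        buffer = []
        with open(self.path) as f:
            for line in f:
                values = [float(tok) for tok in line.strip().split(",")]
                buffer.append(torch.tensor(values))
                if len(buffer) == self.buffer_size:
                    yield from buffer     # could shuffle the buffer here
                    buffer = []
        yield from buffer                 # leftover items at end of file

# Demo: write a small stand-in "training file" of 10 lines.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    for i in range(10):
        f.write(f"{i},{i + 1}\n")
    path = f.name

loader = DataLoader(StreamingDataset(path), batch_size=5)
batches = list(loader)                    # 2 batches of 5 items each
```

Shuffling inside the buffer before yielding gives approximate (not global) randomization, which is the usual trade-off with streaming loaders.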
Data Prep for Machine Learning: Encoding -- Visual Studio Magazine
Dr. James McCaffrey of Microsoft Research uses a full code program and screenshots to explain how to programmatically encode categorical data for use with a machine learning (ML) prediction model such as a neural network classification or regression system. Suppose you are trying to predict voting behavior from a file of people data. Your data might include predictor variables such as each person's sex (male or female) and region of residence (eastern, western, or central), and a dependent variable to predict, such as political leaning (conservative, moderate, or liberal). Neural networks are essentially complex math functions that work with numeric values, so categorical predictor variables and categorical dependent variables must be converted to a numeric form.
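For the voting example above, a minimal encoding sketch might one-hot encode the categorical predictors and map the dependent variable to class indices. The specific encodings below are illustrative choices, not necessarily the ones the article uses.

```python
# One-hot encode categorical predictors; map the dependent variable
# (political leaning) to integer class indices for a classifier.
SEX = {"male": [1, 0], "female": [0, 1]}
REGION = {"eastern": [1, 0, 0], "western": [0, 1, 0], "central": [0, 0, 1]}
LEANING = {"conservative": 0, "moderate": 1, "liberal": 2}

def encode_row(sex: str, region: str, leaning: str):
    """Return (numeric predictor vector, class index) for one person."""
    return SEX[sex] + REGION[region], LEANING[leaning]

x, y = encode_row("female", "central", "moderate")
# x is [0, 1, 0, 0, 1]; y is 1
```

Other schemes exist (for example, minus-one-plus-one encoding for binary predictors), and the right choice depends on the model.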
Data Prep for Machine Learning: Missing Data -- Visual Studio Magazine
Turning his attention to the extremely time-consuming task of machine learning data preparation, Dr. James McCaffrey of Microsoft Research explains how to examine data files and how to identify and deal with missing data. Preparing data for use in a machine learning (ML) system is time consuming, tedious, and error prone. A reasonable rule of thumb is that data preparation requires at least 80 percent of the total time needed to create an ML system. There are three main phases of data preparation: cleaning, normalizing and encoding, and splitting. Each of the three main phases has several steps.
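As a concrete illustration of the cleaning phase, the sketch below scans raw comma-separated lines for empty fields and imputes each missing numeric value with its column mean. The data and the mean-imputation strategy are illustrative assumptions; the article covers several ways to deal with missing data.

```python
# Identify missing fields in raw comma-separated data, then impute
# each missing numeric value with the mean of its column.
raw = [
    "5.1,3.5,1.4",
    "4.9,,1.3",      # missing value in column 1
    "4.7,3.2,",      # missing value in column 2
]

rows = [line.split(",") for line in raw]
n_cols = len(rows[0])

# Column means computed over the non-missing entries only.
means = []
for c in range(n_cols):
    vals = [float(r[c]) for r in rows if r[c] != ""]
    means.append(sum(vals) / len(vals))

cleaned = [
    [float(r[c]) if r[c] != "" else means[c] for c in range(n_cols)]
    for r in rows
]
```

Mean imputation is only one option; depending on the data, dropping incomplete rows or using a model-based imputation may be more appropriate.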
Neural Network L1 Regularization Using Python -- Visual Studio Magazine
The data science doctor continues his exploration of techniques used to reduce the likelihood of model overfitting, caused by training a neural network for too many iterations. Regularization is a technique used to reduce the likelihood of neural network model overfitting. Model overfitting can occur when you train a neural network for too many iterations. This sometimes results in a situation where the trained neural network model predicts the output values for the training data very well, with little error and high accuracy, but when the trained model is applied to new, previously unseen data, the model predicts poorly. There are several forms of regularization.
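The core of L1 regularization is adding a penalty lam * sum(|w|) to the loss, which contributes lam * sign(w) to the weight gradient and pushes small weights toward zero. The sketch below demonstrates this on a simple linear model with synthetic data -- a simplified stand-in for the neural network in the article.

```python
# L1 regularization on a linear model trained by gradient descent:
# the penalty gradient lam * sign(w) shrinks the weights, especially
# those that contribute little to reducing the prediction error.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
true_w = np.array([2.0, 0.0, 0.0, -3.0, 0.0])  # only 2 useful features
y = X @ true_w + rng.normal(scale=0.1, size=100)

def train(lam: float, steps: int = 500, lr: float = 0.05) -> np.ndarray:
    w = np.zeros(5)
    for _ in range(steps):
        # MSE gradient plus the L1 penalty (sub)gradient.
        grad = 2.0 * X.T @ (X @ w - y) / len(y) + lam * np.sign(w)
        w -= lr * grad
    return w

w_plain = train(lam=0.0)   # weights near (2, 0, 0, -3, 0)
w_l1 = train(lam=0.5)      # same pattern, but shrunk toward zero
```

The same idea carries over to a neural network: the L1 term is summed over all weight matrices, and each weight's gradient gains a lam * sign(w) contribution.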
Neural Anomaly Detection Using Keras -- Visual Studio Magazine
An advantage of using a neural technique compared to a standard clustering technique is that neural techniques can handle non-numeric data by encoding that data. Anomaly detection, also called outlier detection, is the process of finding rare items in a dataset. Examples include finding fraudulent login events and fake news items. Take a look at the demo program in Figure 1. The demo examines a 1,000-item subset of the well-known MNIST (modified National Institute of Standards and Technology) dataset.
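The reconstruction-error idea behind autoencoder anomaly detection can be shown with a tiny NumPy autoencoder standing in for the Keras model in the article: train the network to reconstruct normal items, then flag the item it reconstructs worst. The 4-2-4 architecture and the synthetic data are invented for illustration.

```python
# NumPy stand-in for a Keras autoencoder anomaly detector: train a
# tiny 4-2-4 autoencoder on normal items, then rank all items by
# squared reconstruction error; the outlier reconstructs worst.
import numpy as np

rng = np.random.default_rng(0)
# 50 normal items clustered around (1, 0, 1, 0), plus one clear outlier.
normal = rng.normal(0.0, 0.1, size=(50, 4)) + np.array([1.0, 0.0, 1.0, 0.0])
anomaly = np.array([[0.0, 1.0, 0.0, 1.0]])
data = np.vstack([normal, anomaly])

# tanh encoder (4 -> 2), linear decoder (2 -> 4).
W1 = rng.normal(scale=0.1, size=(4, 2)); b1 = np.zeros(2)
W2 = rng.normal(scale=0.1, size=(2, 4)); b2 = np.zeros(4)
lr, n = 0.05, len(normal)

for _ in range(2000):                      # train on normal items only
    h = np.tanh(normal @ W1 + b1)
    out = h @ W2 + b2
    err = out - normal                     # derivative of squared error
    dh = (err @ W2.T) * (1.0 - h ** 2)     # backprop through tanh
    W2 -= lr * (h.T @ err) / n; b2 -= lr * err.mean(axis=0)
    W1 -= lr * (normal.T @ dh) / n; b1 -= lr * dh.mean(axis=0)

# Items the model reconstructs poorly are flagged as anomalies.
recon = np.tanh(data @ W1 + b1) @ W2 + b2
recon_err = ((data - recon) ** 2).sum(axis=1)
most_anomalous = int(np.argmax(recon_err))
```

A Keras version follows the same shape: build an encoder-decoder `Model`, fit it with an MSE loss, then score items by reconstruction error.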
C# Makes GitHub's Top 5 Machine Learning Languages List -- Visual Studio Magazine
Although Python is widely recognized as the de facto, go-to programming language for machine learning and many other artificial intelligence projects, a new study shows C# is holding its own in the space. GitHub mined its extensive internal data to publish a report on all things related to machine learning in its software development platform/open source code repository. The data-based treatise builds on the huge State of the Octoverse 2018 report published last October by the open source champion now owned by Microsoft. The GitHub community consists of more than 31 million developers and more than 2.1 million organizations, hosting more than 96 million repositories. Yesterday, the company published The State of the Octoverse: Machine Learning, which noted the popularity of machine learning/data science projects in the big October report that prompted the company to explore that topic in greater detail.
Visual Studio Live! Keynote: Artificial Intelligence Makes Visual Studio Smarter -- Visual Studio Magazine
Microsoft's Visual Studio team has long made each release of the IDE faster, more functional, less buggy and so on, but now, with the help of artificial intelligence and machine learning, it's actually getting smarter. That was a main takeaway from today's keynote session at Visual Studio Live! in Chicago, where Microsoft's Amanda Silver detailed the latest enhancements to the IDE and what's coming soon. The Director of Program Management for Developer Tools featured AI and ML throughout her presentation -- titled "The Present and Not Too Distant Future of Visual Studio" -- which covered advancements in personal productivity, IntelliCode, DevOps and more. "Our mission with Visual Studio is really to provide the best-in-class tools for any developer building any kind of app," Silver told the jam-packed VSLive! audience. "Today, Visual Studio stands for a ton more. The core of the vision is any developer, any app on any platform."